Privacy via the Johnson-Lindenstrauss Transform
Suppose that party A collects private information about its users, where each
user's data is represented as a bit vector. Suppose that party B has a
proprietary data mining algorithm, such as clustering or nearest neighbors, that
requires estimating the distance between users. We ask whether it is possible for
party A to publish some information about each user so that B can estimate the
distance between users without being able to infer any private bit of a user.
Our method involves projecting each user's representation into a random,
lower-dimensional space via a sparse Johnson-Lindenstrauss transform and then
adding Gaussian noise to each entry of the lower-dimensional representation. We
show that the method preserves differential privacy: the more privacy is
desired, the larger the variance of the Gaussian noise must be. Further, we show how to
approximate the true distances between users via only the lower-dimensional,
perturbed data. Finally, we consider other perturbation methods such as
randomized response and draw comparisons to sketch-based methods. While the
goal of releasing user-specific data to third parties is broader than
preserving distances, this work shows that private distance computation is an
achievable goal.
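To make the mechanism concrete, the following is a minimal Python sketch (using NumPy) of one way such a release and estimation could look: every user is projected with a single shared sparse random matrix, Gaussian noise is added to each coordinate of the projection, and the recipient subtracts the expected noise contribution when estimating squared distances. The sparsity parameter, dimensions, noise scale sigma, and function names below are illustrative assumptions, not the paper's calibrated privacy parameters.

    import numpy as np

    rng = np.random.default_rng(0)

    def sparse_jl_matrix(k, d, s=3):
        # Achlioptas-style sparse projection: entries are +/- sqrt(s/k) with
        # probability 1/(2s) each and 0 otherwise, so squared distances are
        # preserved in expectation (an assumed construction, for illustration).
        probs = [1.0 / (2 * s), 1.0 - 1.0 / s, 1.0 / (2 * s)]
        return rng.choice([-1.0, 0.0, 1.0], size=(k, d), p=probs) * np.sqrt(s / k)

    def publish(x, P, sigma):
        # Party A's release for one user: project the bit vector into k
        # dimensions and add i.i.d. Gaussian noise to every coordinate.
        return P @ x + rng.normal(0.0, sigma, size=P.shape[0])

    def estimate_sq_dist(y_i, y_j, sigma):
        # Party B's estimator: squared distance between the published vectors,
        # minus the expected contribution of the two independent noise vectors.
        k = y_i.shape[0]
        return float(np.sum((y_i - y_j) ** 2) - 2 * k * sigma ** 2)

    d, k, sigma = 1000, 200, 0.5      # sigma would be calibrated to the privacy level
    P = sparse_jl_matrix(k, d)        # one shared projection for all users
    x1 = rng.integers(0, 2, d).astype(float)
    x2 = rng.integers(0, 2, d).astype(float)
    y1, y2 = publish(x1, P, sigma), publish(x2, P, sigma)
    print(np.sum((x1 - x2) ** 2), estimate_sq_dist(y1, y2, sigma))

In this toy setup the estimator is unbiased given the shared projection; the noise variance actually required for differential privacy grows as more privacy is demanded, as the paper shows.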
Are Two Heads the Same as One? Identifying Disparate Treatment in Fair Neural Networks
We show that deep neural networks that satisfy demographic parity do so
through a form of race or gender awareness, and that the more we force a
network to be fair, the more accurately we can recover race or gender from the
internal state of the network. Based on this observation, we propose a simple
two-stage solution for enforcing fairness. First, we train a two-headed network
to predict the protected attribute (such as race or gender) alongside the
original task, and second, we enforce demographic parity by taking a weighted
sum of the heads. In the end, this approach creates a single-headed network
with the same backbone architecture as the original network. Our approach has
near-identical performance to existing regularization-based or preprocessing
methods, but greater stability and higher accuracy where near-exact demographic
parity is required. To cement the relationship between these
two approaches, we show that an unfair and optimally accurate classifier can be
recovered by taking a weighted sum of a fair classifier and a classifier
predicting the protected attribute. We use this to argue that both the fairness
approaches and our explicit formulation demonstrate disparate treatment and
that, consequently, they are likely to be unlawful in a wide range of
scenarios under US law.
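A minimal PyTorch sketch of the two-stage idea follows, under assumed layer sizes and an assumed head-mixing weight alpha (neither reflects the paper's actual architecture or tuning procedure): stage one trains a shared backbone with a head for the original task and a head for the protected attribute; stage two folds the two heads into a single head via a weighted combination of their parameters, yielding a single-headed network with the same backbone.

    import torch
    import torch.nn as nn

    class TwoHeadNet(nn.Module):
        # Stage 1: shared backbone with a task head and a protected-attribute head.
        def __init__(self, in_dim=32, hidden=64):
            super().__init__()
            self.backbone = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
            self.task_head = nn.Linear(hidden, 1)   # original prediction task
            self.attr_head = nn.Linear(hidden, 1)   # protected attribute (e.g. race or gender)

        def forward(self, x):
            h = self.backbone(x)
            return self.task_head(h), self.attr_head(h)

    def joint_loss(task_logit, attr_logit, y, a):
        # Train both heads jointly on the shared representation.
        bce = nn.functional.binary_cross_entropy_with_logits
        return bce(task_logit, y) + bce(attr_logit, a)

    def combine_heads(model, alpha):
        # Stage 2: fold the two heads into one via a weighted sum of their
        # parameters; alpha would be tuned (elsewhere) until the combined head
        # satisfies demographic parity. The result is a single-headed network
        # with the same backbone architecture as the original.
        fair_head = nn.Linear(model.task_head.in_features, 1)
        with torch.no_grad():
            fair_head.weight.copy_(model.task_head.weight - alpha * model.attr_head.weight)
            fair_head.bias.copy_(model.task_head.bias - alpha * model.attr_head.bias)
        return nn.Sequential(model.backbone, fair_head)

    model = TwoHeadNet()
    x = torch.randn(8, 32)
    fair_model = combine_heads(model, alpha=0.5)   # alpha chosen here for illustration only
    print(fair_model(x).shape)                     # torch.Size([8, 1])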